AITopics | New Braunfels

Emotion-Guided Image to Music Generation

Kundu, Souraja; Singh, Saket; Iwahori, Yuji

arXiv.org Artificial Intelligence, Oct-29-2024

Generating music from images can enhance various applications, including background music for photo slideshows, social media experiences, and video creation. This paper presents an emotion-guided image-to-music generation framework that leverages the Valence-Arousal (VA) emotional space to produce music that aligns with the emotional tone of a given image. Unlike previous models that rely on contrastive learning for emotional consistency, the proposed approach directly integrates a VA loss function to enable accurate emotional alignment. The model employs a CNN-Transformer architecture, featuring pre-trained CNN image feature extractors and three Transformer encoders to capture complex, high-level emotional features from MIDI music. Three Transformer decoders refine these features to generate musically and emotionally consistent MIDI sequences. Experimental results on a newly curated emotionally paired image-MIDI dataset demonstrate the proposed model's superior performance across metrics such as Polyphony Rate, Pitch Entropy, Groove Consistency, and loss convergence.
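Two of the evaluation metrics named in the abstract can be illustrated with a short sketch. The abstract does not give the paper's exact formulas, so the code below assumes commonly used definitions: Pitch Entropy as the Shannon entropy of the pitch histogram, and Polyphony Rate as the fraction of time steps with more than one note sounding. The function names and the `(pitch, start_step, end_step)` note format are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def pitch_entropy(pitches):
    """Shannon entropy (in bits) of the pitch histogram.

    Assumed definition; `pitches` is a list of MIDI pitch numbers.
    """
    if not pitches:
        return 0.0
    counts = Counter(pitches)
    total = len(pitches)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def polyphony_rate(notes, n_steps):
    """Fraction of time steps in which more than one note is sounding.

    Assumed definition; `notes` is an iterable of hypothetical
    (pitch, start_step, end_step) tuples with an exclusive end step.
    """
    if n_steps <= 0:
        return 0.0
    active = [0] * n_steps
    for _pitch, start, end in notes:
        # Clamp each note to the evaluated window before counting.
        for t in range(max(start, 0), min(end, n_steps)):
            active[t] += 1
    return sum(1 for a in active if a > 1) / n_steps


if __name__ == "__main__":
    # Toy sequence: C4 held for four steps, E4 joins for the last two.
    toy_notes = [(60, 0, 4), (64, 2, 4)]
    print(pitch_entropy([p for p, _, _ in toy_notes]))
    print(polyphony_rate(toy_notes, n_steps=4))
```

Under these assumed definitions, a higher Pitch Entropy indicates a wider spread of pitches, and a higher Polyphony Rate indicates that chords or overlapping notes occur more often.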

large language model, machine learning, natural language

arXiv: 2410.22299

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Chūbu (0.04)
North America > United States > Texas > Comal County > New Braunfels (0.04)

Genre: Research Report (0.50)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


The Role of Language Models in Modern Healthcare: A Comprehensive Review

Khalid, Amna; Khalid, Ayma; Khalid, Umar

arXiv.org Artificial Intelligence, Sep-25-2024

The application of large language models (LLMs) in healthcare has gained significant attention due to their ability to process complex medical data and provide insights for clinical decision-making. These models have demonstrated substantial capabilities in understanding and generating natural language, which is crucial for medical documentation, diagnostics, and patient interaction. This review examines the trajectory of language models from their early stages to the current state-of-the-art LLMs, highlighting their strengths in healthcare applications and discussing challenges such as data privacy, bias, and ethical considerations. The potential of LLMs to enhance healthcare delivery is explored, alongside the necessary steps to ensure their ethical and effective integration into medical practice.

arxiv preprint arxiv, healthcare, language model

arXiv: 2409.16860

Country:

North America > United States > Texas > Comal County > New Braunfels (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)
Asia > Pakistan > Punjab > Lahore Division > Lahore (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Detecting Cross-Modal Inconsistency to Defend Against Neural Fake News

Tan, Reuben; Saenko, Kate; Plummer, Bryan A.

arXiv.org Artificial Intelligence, Sep-22-2020

Large-scale dissemination of disinformation online intended to mislead or deceive the general population is a major societal problem. Rapid progression in image, video, and natural language generative models has only exacerbated this situation and intensified our need for an effective defense mechanism. While existing approaches have been proposed to defend against neural fake news, they are generally constrained to the very limited setting where articles only have text and metadata such as the title and authors. In this paper, we introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions. To identify the possible weaknesses that adversaries can exploit, we create a NeuralNews dataset composed of 4 different types of generated articles as well as conduct a series of human user study experiments based on this dataset. In addition to the valuable insights gleaned from our user study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.

caption, machine learning, natural language

arXiv: 2009.07698

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New York > Bronx County > New York City (0.04)
North America > United States > Hawaii (0.04)

Genre:

Questionnaire & Opinion Survey (0.97)
Research Report > New Finding (0.46)

Industry:

Media > News (1.00)
Leisure & Entertainment > Sports > Baseball (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
